An in-depth look at lexical analysis, the first stage of compiler design: tokens, lexemes, regular expressions, finite automata, and their practical applications.
Compiler Design: Fundamentals of Lexical Analysis
Compiler design is a fascinating and important area of computer science that underpins much of modern software development. A compiler bridges the gap between human-readable source code and machine-executable instructions. This article takes a close look at the fundamentals of lexical analysis, the first stage of the compilation process, covering its purpose, its key concepts, and its practical implications for aspiring compiler designers and software engineers.
What Is Lexical Analysis?
Lexical analysis, also known as scanning or tokenization, is the first phase of a compiler. Its main job is to read the source code as a stream of characters and group those characters into meaningful sequences called lexemes. Each lexeme is then classified according to its role, producing a sequence of tokens. You can think of it as an initial sorting and labeling pass that prepares the input for further processing.
For example, take the statement x = y + 5;. A lexer breaks it into the following tokens:
- Identifier: x
- Assignment operator: =
- Identifier: y
- Addition operator: +
- Integer literal: 5
- Semicolon: ;
In essence, the lexer identifies these basic building blocks of the programming language.
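As a rough illustration of the breakdown above, here is a minimal sketch built on Python's standard `re` module. The token names (`IDENTIFIER`, `ASSIGN`, and so on) and the helper `tokenize_statement` are illustrative assumptions, not part of any particular compiler.

```python
import re

# Each token type is paired with a pattern; the names are illustrative.
TOKEN_SPEC = [
    ("INTEGER_LITERAL", r"[0-9]+"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("ASSIGN", r"="),
    ("PLUS", r"\+"),
    ("SEMICOLON", r";"),
    ("SKIP", r"\s+"),
]
# One combined pattern with a named group per token type.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize_statement(source):
    """Return (token_type, lexeme) pairs for a tiny statement language."""
    tokens = []
    pos = 0
    while pos < len(source):
        m = MASTER.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":  # whitespace separates tokens but is not one
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for token_type, lexeme in tokenize_statement("x = y + 5;"):
    print(token_type, lexeme)
```

Whitespace is matched so that the scanner can advance past it, but it never reaches the token list, which is why the six tokens listed above are all that come out.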
Key Concepts in Lexical Analysis
Tokens and Lexemes
As noted above, a token is the classified representation of a lexeme, and a lexeme is the actual sequence of characters in the source code that matches the token's pattern. Consider the following Python snippet:
if x > 5:
    print("x is greater than 5")
Here are some of the tokens and lexemes in this snippet:
- Token: KEYWORD, Lexeme: if
- Token: IDENTIFIER, Lexeme: x
- Token: RELATIONAL_OPERATOR, Lexeme: >
- Token: INTEGER_LITERAL, Lexeme: 5
- Token: COLON, Lexeme: :
- Token: IDENTIFIER, Lexeme: print (in Python 3, print is a built-in function name, not a keyword)
- Token: STRING_LITERAL, Lexeme: "x is greater than 5"
The token names the category of a lexeme, while the lexeme is the actual string taken from the source code. The parser, which is the next stage of the compiler, works with the tokens to understand the structure of the program.
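To see how a real lexer labels the same snippet, the sketch below runs it through Python's standard `tokenize` module. Note that CPython uses coarser categories than the illustrative names in the list above: `if` and `print` are both reported as `NAME`, operators and punctuation as `OP`, and layout produces extra `NEWLINE`, `INDENT`, and `DEDENT` tokens. The helper name `python_tokens` is an assumption made for this example.

```python
import io
import tokenize

source = 'if x > 5:\n    print("x is greater than 5")\n'


def python_tokens(code):
    """Return (token_name, lexeme) pairs as reported by CPython's tokenizer."""
    pairs = []
    for tok in tokenize.generate_tokens(io.StringIO(code).readline):
        pairs.append((tokenize.tok_name[tok.type], tok.string))
    return pairs


for name, lexeme in python_tokens(source):
    print(f"{name:10} {lexeme!r}")
```

The split between token category and lexeme is the same as in the list above; only the granularity of the categories differs.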
Regular Expressions
Regular expressions (regex) are a powerful, concise notation for describing patterns of characters. In lexical analysis they are widely used to define the pattern a lexeme must match in order to be recognized as a particular token. Regular expressions are a fundamental concept not only in compiler design but in many areas of computer science, from text processing to network security.
Here are some common regular expression symbols and their meanings:
- . (dot): matches any single character except a newline.
- * (asterisk): matches zero or more repetitions of the preceding element.
- + (plus): matches one or more repetitions of the preceding element.
- ? (question mark): matches zero or one occurrence of the preceding element.
- [] (square brackets): define a character class. For example, [a-z] matches any lowercase letter.
- [^] (negated square brackets): define a negated character class. For example, [^0-9] matches any character that is not a digit.
- | (pipe): expresses alternation (OR). For example, a|b matches either `a` or `b`.
- () (parentheses): group elements and capture them.
- \ (backslash): escapes a special character. For example, \. matches a literal dot.
Let's look at a few examples of how regular expressions can define tokens:
- Integer literal: [0-9]+ (one or more digits)
- Identifier: [a-zA-Z_][a-zA-Z0-9_]* (starts with a letter or underscore, followed by zero or more letters, digits, or underscores)
- Floating-point literal: [0-9]+\.[0-9]+ (one or more digits, then a dot, then one or more digits). This is a simplified example; a more robust regular expression would also handle exponents and an optional sign.
The rules for identifiers, integer literals, and other tokens differ from one programming language to another, so the corresponding regular expressions have to be adjusted to match. For example, some languages allow Unicode characters in identifiers, which calls for more complex regular expressions.
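The patterns above can be tried out directly with Python's `re` module. In this sketch, `ROBUST_FLOAT` is one assumed way of adding an optional sign and exponent, and the helper `matches` is hypothetical; neither is taken from a specific language definition.

```python
import re

INTEGER = re.compile(r"[0-9]+")
IDENTIFIER = re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*")
SIMPLE_FLOAT = re.compile(r"[0-9]+\.[0-9]+")
# Assumed extension: optional leading sign and optional exponent part.
ROBUST_FLOAT = re.compile(r"[+-]?[0-9]+\.[0-9]+([eE][+-]?[0-9]+)?")


def matches(pattern, text):
    """True when the whole text (not just a prefix) fits the pattern."""
    return pattern.fullmatch(text) is not None


print(matches(INTEGER, "42"))                # True
print(matches(IDENTIFIER, "_tmp1"))          # True
print(matches(IDENTIFIER, "1abc"))           # False: starts with a digit
print(matches(SIMPLE_FLOAT, "-1.5e10"))      # False: sign and exponent unsupported
print(matches(ROBUST_FLOAT, "-1.5e10"))      # True
print(matches(IDENTIFIER, "\u5909\u6570"))   # False: the class is ASCII-only
```

The last line shows the Unicode point made above: an identifier written in Japanese is rejected by the ASCII character class, so a language that permits such names needs a broader pattern.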
Finite Automata
Finite automata (FA) are abstract machines used to recognize the patterns defined by regular expressions. They are a central concept in the implementation of lexers. There are two main types of finite automata:
- Deterministic finite automaton (DFA): for each state and input symbol there is exactly one transition to another state. DFAs are easy to implement and run, but building one directly from a regular expression can be more involved.
- Non-deterministic finite automaton (NFA): for each state and input symbol there may be zero, one, or several transitions to other states. NFAs are easy to build from regular expressions, but they need a more complex execution algorithm.
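The "more complex execution algorithm" for an NFA usually means tracking a set of possible states instead of a single one. The sketch below simulates a hand-written NFA for the textbook pattern (a|b)*abb; the state numbering and the name `nfa_accepts` are assumptions made for this example, and epsilon transitions are left out to keep it short.

```python
# NFA for (a|b)*abb. State 0 has two choices on "a", which is what makes
# the machine non-deterministic.
NFA = {
    (0, "a"): {0, 1},
    (0, "b"): {0},
    (1, "b"): {2},
    (2, "b"): {3},
}
START = 0
ACCEPTING = {3}


def nfa_accepts(text):
    """Simulate the NFA by following every possible transition at once."""
    current = {START}
    for ch in text:
        following = set()
        for state in current:
            following |= NFA.get((state, ch), set())
        current = following
        if not current:  # no state can consume this character
            return False
    return bool(current & ACCEPTING)


print(nfa_accepts("aabb"))  # True
print(nfa_accepts("abba"))  # False
```

A DFA for the same language would need only one current state per step, which is why lexers normally convert the NFA to a DFA ahead of time.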
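The typical process in lexical analysis is as follows:
- Convert the regular expression for each token type into an NFA.
- Convert the NFA into a DFA.
- Implement the DFA as a table-driven scanner.
The DFA is then used to scan the input stream and identify tokens. It starts in its initial state and reads the input one character at a time, moving to a new state based on the current state and the input character. If the DFA is in an accepting state after reading a sequence of characters, that sequence is recognized as a lexeme and the corresponding token is emitted. In practice, scanners keep reading for as long as a transition exists and emit the longest lexeme that ended in an accepting state (the "maximal munch" rule).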
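A table-driven scanner along these lines can be sketched in a few lines of Python. The states, the character classes, and the token names below are hypothetical and written by hand rather than derived from an NFA; the point is the scanning loop, which remembers the last accepting state so that it can emit the longest match.

```python
def char_class(ch):
    """Map a character to the column used in the transition table."""
    if ch.isascii() and (ch.isalpha() or ch == "_"):
        return "letter"
    if ch.isascii() and ch.isdigit():
        return "digit"
    if ch == ".":
        return "dot"
    return "other"


# TABLE[state][char_class] -> next state; a missing entry means "no move".
TABLE = {
    0: {"letter": 1, "digit": 2},  # start state
    1: {"letter": 1, "digit": 1},  # inside an identifier
    2: {"digit": 2, "dot": 3},     # inside an integer
    3: {"digit": 4},               # just after the dot (not accepting)
    4: {"digit": 4},               # inside the fraction
}
ACCEPTING = {1: "IDENTIFIER", 2: "INTEGER", 4: "FLOAT"}


def scan(text):
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        state, i = 0, pos
        last_accept = None  # (end index, token type) of the longest match so far
        while i < len(text):
            state = TABLE[state].get(char_class(text[i]))
            if state is None:
                break
            i += 1
            if state in ACCEPTING:
                last_accept = (i, ACCEPTING[state])
        if last_accept is None:
            raise ValueError(f"lexical error at offset {pos}: {text[pos]!r}")
        end, kind = last_accept
        tokens.append((kind, text[pos:end]))
        pos = end  # resume right after the longest accepted lexeme
    return tokens


print(scan("count 42 3.14 x1"))
```

With the input "12.x" the scanner reads "12." but falls back to the last accepting position, emits INTEGER "12", and then reports the stray dot as a lexical error.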
How Lexical Analysis Works
A lexer operates as follows:
- Read the source code: the lexer reads the source code one character at a time from an input file or stream.
- Identify lexemes: the lexer uses regular expressions (more precisely, a DFA derived from them) to identify the character sequences that form valid lexemes.
- Generate tokens: for each lexeme it finds, the lexer creates a token containing the lexeme itself and its token type (for example IDENTIFIER, INTEGER_LITERAL, OPERATOR).
- Handle errors: if the lexer encounters a character sequence that matches none of the defined patterns (that is, it cannot be tokenized), it reports a lexical error. This can include invalid characters or malformed identifiers.
- Pass tokens to the parser: the lexer hands the stream of tokens to the parser, the next phase of the compiler.
Consider this simple C snippet:
int main() {
    int x = 10;
    return 0;
}
The lexer processes this code and produces the following tokens (simplified):
- KEYWORD: int
- IDENTIFIER: main
- LEFT_PAREN: (
- RIGHT_PAREN: )
- LEFT_BRACE: {
- KEYWORD: int
- IDENTIFIER: x
- ASSIGNMENT_OPERATOR: =
- INTEGER_LITERAL: 10
- SEMICOLON: ;
- KEYWORD: return
- INTEGER_LITERAL: 0
- SEMICOLON: ;
- RIGHT_BRACE: }
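The token list above can be reproduced with a short Python sketch. The `Token` type, the two-entry keyword set, and the function `lex` are assumptions made for illustration; a real C lexer handles many more keywords, operators, comments, and literal forms.

```python
import re
from typing import NamedTuple


class Token(NamedTuple):
    type: str
    lexeme: str


KEYWORDS = {"int", "return"}  # tiny illustrative subset of the C keywords
PUNCTUATION = {
    "(": "LEFT_PAREN", ")": "RIGHT_PAREN",
    "{": "LEFT_BRACE", "}": "RIGHT_BRACE",
    "=": "ASSIGNMENT_OPERATOR", ";": "SEMICOLON",
}
PATTERN = re.compile(r"\s+|[A-Za-z_][A-Za-z0-9_]*|[0-9]+|[(){}=;]")


def lex(source):
    tokens = []
    pos = 0
    while pos < len(source):
        m = PATTERN.match(source, pos)
        if m is None:
            raise SyntaxError(f"cannot tokenize {source[pos]!r} at offset {pos}")
        text = m.group()
        pos = m.end()
        if text.isspace():
            continue  # whitespace only separates lexemes
        if text in PUNCTUATION:
            tokens.append(Token(PUNCTUATION[text], text))
        elif text[0].isdigit():
            tokens.append(Token("INTEGER_LITERAL", text))
        elif text in KEYWORDS:
            tokens.append(Token("KEYWORD", text))  # reserved word
        else:
            tokens.append(Token("IDENTIFIER", text))
    return tokens


c_source = "int main() {\n    int x = 10;\n    return 0;\n}\n"
for token in lex(c_source):
    print(f"{token.type}: {token.lexeme}")
```

Keywords are matched with the identifier pattern first and then looked up in a set, a common design that keeps the pattern list small.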
Implementing a Lexical Analyzer
There are two main approaches to implementing a lexer:
- Manual implementation: writing the lexer code by hand. This gives you more control and more room for optimization, but it takes longer and is more error-prone.
- Using a lexer generator: relying on tools such as Lex (Flex), ANTLR, or JFlex, which generate lexer code automatically from a specification written with regular expressions.
Manual Implementation
A manual implementation usually means designing a state machine (a DFA) and writing code that moves between states based on the input characters. This approach gives fine-grained control over the lexing process and can be tuned for specific performance requirements. However, it demands a solid understanding of regular expressions and finite automata, and the result can be hard to maintain and debug.
Below is a conceptual (and heavily simplified) example of how a hand-written lexer in Python might handle integer literals:
def lexer(input_string):
    """Tokenize integer literals, '+' and '-'; anything else is skipped."""
    digits = "0123456789"
    tokens = []
    i = 0
    while i < len(input_string):
        ch = input_string[i]
        if ch in digits:
            # Found a digit: consume the whole run of digits as one integer.
            start = i
            while i < len(input_string) and input_string[i] in digits:
                i += 1
            tokens.append(("INTEGER", int(input_string[start:i])))
            continue  # i already points at the first character after the number
        elif ch == '+':
            tokens.append(("PLUS", "+"))
        elif ch == '-':
            tokens.append(("MINUS", "-"))
        # ... (handle other characters and tokens)
        i += 1
    return tokens
This is a rudimentary example, but it shows the basic idea: read the input string by hand and identify tokens from the character patterns.
Lexer Generators
Lexer generators are tools that automate the creation of a lexer. They take a specification file as input, which defines a regular expression for each token type and the action to run when that token is recognized. The generator then produces lexer code in the target programming language.
Here are some popular lexer generators:
- Lex (Flex): a widely used lexer generator, often paired with the parser generator Yacc (Bison). Flex is known for its speed and efficiency.
- ANTLR (ANother Tool for Language Recognition): a powerful parser generator that includes a lexer generator. ANTLR supports a wide range of programming languages and can express complex grammars and lexers.
- JFlex: a lexer generator designed specifically for Java. JFlex generates efficient, highly customizable lexers.
Using a lexer generator has several advantages:
- Shorter development time: a lexer generator greatly reduces the time and effort needed to develop a lexer.
- Better accuracy: the lexer is generated from clearly defined regular expressions, which lowers the risk of errors.
- Maintainability: a lexer specification is usually easier to read and maintain than hand-written code.
- Performance: modern lexer generators produce highly optimized lexers that can achieve excellent performance.
Below is an example of a simple Flex specification that recognizes integers and identifiers:
%%
[0-9]+ { printf("INTEGER: %s\n", yytext); }
[a-zA-Z_][a-zA-Z0-9_]* { printf("IDENTIFIER: %s\n", yytext); }
[ \t\n]+ ; // Ignore whitespace
. { printf("ILLEGAL CHARACTER: %s\n", yytext); }
%%
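This specification defines rules for integers and identifiers, along with a rule that discards whitespace and a catch-all rule that reports any other character as illegal. When Flex processes the specification, it generates the C code for a lexer that recognizes these tokens. The yytext variable holds the matched lexeme.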
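When several rules could match at the same position, Flex picks the rule that matches the longest text and, on a tie, the rule listed first. The Python sketch below imitates that selection for the four Flex rules above. It is only an analogy for how the rules interact, not how Flex is implemented (Flex compiles the rules into a single DFA), and the name `run_rules` is hypothetical.

```python
import re

# (pattern, action) pairs in priority order, mirroring the Flex rules above.
RULES = [
    (re.compile(r"[0-9]+"), lambda text: f"INTEGER: {text}"),
    (re.compile(r"[a-zA-Z_][a-zA-Z0-9_]*"), lambda text: f"IDENTIFIER: {text}"),
    (re.compile(r"[ \t\n]+"), lambda text: None),  # ignore whitespace
    (re.compile(r"."), lambda text: f"ILLEGAL CHARACTER: {text}"),
]


def run_rules(source):
    """Apply the rule with the longest match; earlier rules win ties."""
    output = []
    pos = 0
    while pos < len(source):
        best_len, best_action = 0, None
        for pattern, action in RULES:
            m = pattern.match(source, pos)
            # Only a strictly longer match replaces the current best,
            # so the earliest rule is kept when lengths are equal.
            if m and len(m.group()) > best_len:
                best_len, best_action = len(m.group()), action
        # The whitespace and catch-all rules together cover every character.
        assert best_action is not None
        message = best_action(source[pos:pos + best_len])
        if message is not None:
            output.append(message)
        pos += best_len
    return output


print(run_rules("abc 123 $"))
```

For a single space, both the whitespace rule and the catch-all rule match one character; the whitespace rule wins because it appears first, which is why rule order matters in a specification.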
Error Handling in Lexical Analysis
Error handling is an important aspect of lexical analysis. When the lexer encounters an invalid character or a malformed lexeme, it must report an error to the user. Common lexical errors include:
- Invalid characters: characters that are not part of the language's alphabet (for example, a $ sign in a language that does not allow it in identifiers).
- Unterminated strings: string literals that are not closed by a matching quotation mark.
- Invalid numbers: numbers that are not well formed (for example, a number with more than one decimal point).
- Exceeding maximum lengths: identifiers or string literals that are longer than the maximum length allowed.
When a lexical error is detected, the lexer should do the following:
- Report the error: produce an error message that includes the line and column numbers where the error occurred, along with a description of the error.
- Attempt recovery: try to recover from the error and continue scanning the input. This may involve skipping the invalid characters or ending the current token. The goal is to avoid cascading errors and to give the user as much information as possible.
Error messages should be clear and informative, and they should help the programmer find and fix the problem quickly. For example, a good message for an unterminated string might be: "Error: unterminated string literal at line 10, column 25".
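The reporting and recovery steps above might be sketched as follows. The function `lex_with_errors`, the catch-all `WORD` token type, and the exact recovery choices (skip one bad character; abandon an unterminated string at the end of its line) are assumptions made for this example.

```python
def lex_with_errors(source):
    """Return (tokens, errors) and keep scanning after each lexical error."""
    tokens, errors = [], []
    line, col = 1, 1
    i = 0
    while i < len(source):
        ch = source[i]
        if ch == "\n":
            line, col, i = line + 1, 1, i + 1
        elif ch.isspace():
            col, i = col + 1, i + 1
        elif ch.isalnum() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            # Simplification: names and numbers share one token type.
            tokens.append(("WORD", source[start:i], line, col))
            col += i - start
        elif ch == '"':
            end = i + 1
            while end < len(source) and source[end] not in '"\n':
                end += 1
            if end < len(source) and source[end] == '"':
                end += 1  # include the closing quote
                tokens.append(("STRING", source[i:end], line, col))
            else:
                # Recovery: give up on this string at the end of the line.
                errors.append(
                    f"Error: unterminated string literal at line {line}, column {col}"
                )
            col += end - i
            i = end
        else:
            errors.append(f"Error: invalid character {ch!r} at line {line}, column {col}")
            # Recovery: skip the offending character and keep scanning.
            col, i = col + 1, i + 1
    return tokens, errors


found, problems = lex_with_errors('x $ y\nname "abc\nz')
for message in problems:
    print(message)
```

Because the lexer recovers instead of stopping, a single run reports both problems and still tokenizes the valid text around them.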
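The Role of Lexical Analysis in the Compilation Process
Lexical analysis is a critical first step in compilation. Its output, the stream of tokens, serves as the input to the next phase, the parser (syntax analyzer). The parser uses the tokens to build an abstract syntax tree (AST) that represents the grammatical structure of the program. Without accurate and reliable lexical analysis, the parser cannot interpret the source code correctly.
The relationship between lexical analysis and syntax analysis can be summarized as follows:
- Lexical analysis: splits the source code into a stream of tokens.
- Syntax analysis: analyzes the structure of the token stream and builds an abstract syntax tree (AST).
The AST is then used by the later phases of the compiler, such as semantic analysis, intermediate code generation, and code optimization, to produce the final executable code.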
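To make the hand-off concrete, here is a minimal sketch of a parser that consumes a token list for a statement such as x = y + 5; and builds a small AST out of nested tuples. The token names, the tuple-based node shapes, and the function `parse_assignment` are all assumptions for illustration; production parsers use richer node types and grammars.

```python
def parse_assignment(tokens):
    """Parse IDENTIFIER '=' operand ('+' operand)* ';' into a nested-tuple AST."""
    pos = 0

    def expect(kind):
        nonlocal pos
        if pos >= len(tokens) or tokens[pos][0] != kind:
            found = tokens[pos][0] if pos < len(tokens) else "end of input"
            raise SyntaxError(f"expected {kind}, found {found}")
        lexeme = tokens[pos][1]
        pos += 1
        return lexeme

    def operand():
        if pos < len(tokens) and tokens[pos][0] == "INTEGER_LITERAL":
            return ("int", int(expect("INTEGER_LITERAL")))
        return ("name", expect("IDENTIFIER"))

    target = expect("IDENTIFIER")
    expect("ASSIGN")
    node = operand()
    while pos < len(tokens) and tokens[pos][0] == "PLUS":
        expect("PLUS")
        node = ("add", node, operand())  # left-associative addition
    expect("SEMICOLON")
    return ("assign", target, node)


# Token stream as a lexer might produce it for: x = y + 5;
token_stream = [
    ("IDENTIFIER", "x"), ("ASSIGN", "="), ("IDENTIFIER", "y"),
    ("PLUS", "+"), ("INTEGER_LITERAL", "5"), ("SEMICOLON", ";"),
]
print(parse_assignment(token_stream))
```

The parser never looks at individual characters: it only checks token types and reads lexemes, which is exactly the separation of concerns described above.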
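Advanced Topics in Lexical Analysis
This article covers the basics of lexical analysis, but several advanced topics are worth exploring further:
- Unicode support: handling Unicode characters in identifiers and string literals. This requires more complex regular expressions and character classification techniques.
- Lexing embedded languages: lexing a language embedded inside another one (for example, SQL embedded in Java). This often involves switching between different lexers depending on context.
- Incremental lexical analysis: lexing that can efficiently rescan only the parts of the source code that changed. This is useful in interactive development environments.
- Context-sensitive lexical analysis: lexing in which a token's type depends on the surrounding context. This can be used to resolve ambiguities in a language's syntax.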
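A classic case of context-sensitive lexing is C's typedef: once a name has been declared as a type, the same lexeme must be classified differently. The sketch below is a deliberately simplified illustration in which the lexer itself tracks typedef names; in real C compilers this information usually comes back from the parser's symbol table. The names `lex_with_typedefs` and `BUILTIN_TYPES`, and the token types, are hypothetical.

```python
import re

TOKEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|[0-9]+|\S")
BUILTIN_TYPES = {"int", "char"}  # illustrative subset


def lex_with_typedefs(source):
    """Classify words, promoting names declared via 'typedef <type> <name>;'."""
    type_names = set(BUILTIN_TYPES)
    words = TOKEN.findall(source)
    tokens = []
    for index, text in enumerate(words):
        if text == "typedef":
            kind = "KEYWORD"
        elif text in type_names:
            kind = "TYPE_NAME"
        elif text[0].isalpha() or text[0] == "_":
            kind = "IDENTIFIER"
            # Context check: a name that closes a typedef becomes a type
            # for every later occurrence.
            closes_typedef = (
                index >= 2
                and words[index - 2] == "typedef"
                and words[index + 1:index + 2] == [";"]
            )
            if closes_typedef:
                type_names.add(text)
        elif text[0].isdigit():
            kind = "INTEGER"
        else:
            kind = "PUNCT"
        tokens.append((kind, text))
    return tokens


for kind, text in lex_with_typedefs("typedef int length; length x;"):
    print(kind, text)
```

The lexeme "length" is an IDENTIFIER where it is declared and a TYPE_NAME afterwards, which a context-free set of regular expressions alone cannot express.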
Internationalization Considerations
When designing a compiler for a language intended for global use, consider the following internationalization aspects of lexical analysis:
- Character encoding: support for various character encodings (UTF-8, UTF-16, and so on) in order to handle different alphabets and character sets.
- Locale-specific formatting: handling locale-specific number and date formats. For example, in some locales the decimal separator is a comma (,) rather than a period (.).
- Unicode normalization: normalizing Unicode strings to guarantee consistent comparison and matching.
If internationalization is not handled properly, source code written in different languages or using different character sets can be tokenized incorrectly or fail to compile.
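The encoding and normalization points can be illustrated with Python's standard library. This sketch uses NFKC, the normalization form that Python itself applies to identifiers; whether a given language should use NFC or NFKC is a design decision, and the helper name `normalize_identifier` is an assumption.

```python
import unicodedata


def normalize_identifier(name):
    """Normalize a name to NFKC so that equivalent spellings compare equal."""
    return unicodedata.normalize("NFKC", name)


composed = "caf\u00e9"     # "é" as a single code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

print(composed == decomposed)  # False: different code point sequences
print(normalize_identifier(composed) == normalize_identifier(decomposed))  # True

# The same source text arrives as different bytes depending on the encoding,
# so the lexer must decode with the right one before scanning characters.
text = "\u5909\u6570 = 1"
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")
print(len(utf8_bytes), len(utf16_bytes))
print(utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16"))  # True

# Python's own rule for identifiers accepts non-ASCII letters.
print("\u5909\u6570".isidentifier())  # True
```

Without normalization, the two spellings of the same word would become two distinct identifiers, which is the kind of inconsistent matching the list above warns about.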
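Conclusion
Lexical analysis is a fundamental part of compiler design. A solid grasp of the concepts described in this article is essential for anyone who builds or works with compilers, interpreters, and other language-processing tools. From understanding tokens and lexemes to mastering regular expressions and finite automata, knowledge of lexical analysis provides a strong foundation for exploring compiler construction further. By making use of lexer generators and taking internationalization into account, developers can build robust, efficient lexers for a wide range of programming languages and platforms. As software development continues to evolve, the principles of lexical analysis will remain a cornerstone of language-processing technology.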